FOREWORD


The SURVEILLANCE SYSTEMS research program of the U. S. Army Behavioral Science
Research Laboratory has as its objective the production of scientific data bearing on the
extraction of information from surveillance displays, and the efficient storage, retrieval,
and transmission of this information within an advanced computerized image interpretation
facility. Research results are used in future systems design and in the development
of enhanced techniques for all phases of the interpretation process. Research is conducted
under Army RDT&E Project No. 2Q662704A721, "Surveillance Systems; Ground
Surveillance and Target Acquisition Interpreter Techniques," FY 1969 Work Program.

The development of procedures to maintain and improve the proficiency of image
interpreters within an image interpretation facility is one of the major objectives of the
Work Unit, "Information Processing in Advanced Image Interpretation Systems--IMAGE
SYSTEMS." The present publication reports on one aspect of assigning interpreters to
work as two-man teams in which one interpreter checks interpretations made independently
by his teammate. The study concentrates on the checker's statements of his confidence
in identifications as affected by varying levels of identification accuracy and
confidence validity on the part of the initial interpreter.


J.  E.  UHLANER,  Director 
U.  S.  Army  Behavioral  Science 
Research  Laboratory 


Requirement: 


Research to develop effective team procedures for image interpretation requires
study of the type and amount of information exchanged among team members. The objective
of the present study was to determine how an initial interpreter's accuracy of
identification and validity of stated confidence in his identifications affect the usefulness
of the checker's confidence statements.


Procedure: 

Identifications of 60 annotated targets and associated confidence statements were
obtained from 18 newly trained image interpreters. Confidence estimates were stated
under a point payoff scheme in which it was to the disadvantage of the interpreter to
overstate or understate his confidence. Half of the interpreters were given individual
performance feedback. Interpreters were then presented with three sets of 60 annotated
images to which identifications and confidence statements attributed to an initial
interpreter were attached. The information provided incorporated three levels of identification
accuracy and three levels of confidence validity, arranged according to a Graeco-Latin
square research design. The task of the interpreter was to examine each annotation, note
the previous identification and confidence statement, and then state his own confidence
in the identification. The payoff scheme used in the preliminary set was applied.


Findings: 

1.  Checkers  typically  improved  on  the  confidence  validity  of  interpreters  who  were 
poor  or  only  moderately  good  in  stating  confidence. 

2. Checker confidence statements in identifications made by interpreters with an
"excellent" record of confidence validity were less valid than those of the initial
interpreters.

3. Interpreters' confidence statements were more valid when they were checking
than when they were stating confidence in their own identifications.

4.  Checkers'  confidence  statements  were  more  affected  by  observed  variations  in 
the  identification  accuracy  of  the  initial  interpreter  than  by  his  confidence  validity. 

5.  Knowledge  of  their  own  initial  identification  and  confidence  proficiency  did  not 
affect  checker  performance. 


Utilization  of  Findings: 

In  team  operations,  confidence  statements  made  by  initial  interpreters  who  have 
excellent  records  of  estimating  the  probability  that  their  identifications  are  correct 
should  be  allowed  to  stand. 

The  checkers'  confidence  statements  are  to  be  preferred  when  initial  statements  are 
supplied  by  interpreters  whose  past  performance  in  making  such  statements  is  poor  or 
only  moderately  good. 
CHECKER CONFIDENCE STATEMENTS AS AFFECTED BY PERFORMANCE OF INITIAL IMAGE
INTERPRETER


Image interpreter teams in which one man checks the reports of
another have generally yielded more accurate and complete information
than the average interpreter working alone.^1 However, in some instances,
teams have not shown any improvement.^2 An individual interpreter's
performance may even suffer as a result of his being part of a team. It
was found that the better the initial interpreter, the less the improvement
resulting from adding a checking interpreter; and conversely, the
better the checking interpreter, the greater the improvement.^3

Experiments on interpreter/checker performance have generally
concentrated on completeness, accuracy, and timeliness as measures of
individual and team proficiency. With the advent of computerized
intelligence systems, emphasis on techniques for processing probabilistic
information has grown. Recognition that interpreter identifications of
dispersed and concealed enemy targets can seldom be made with 100% certitude
has led to study of the interpreter's confidence in his findings and its
operational use in assessing the probability that given identifications
are in fact correct. Use of the initial interpreter's confidence in his
identifications to help determine which identifications a checker should
examine has been explored.^4 However, the direct effect of the initial
interpreter's confidence on the checker's own accuracy and confidence
has not been systematically studied. In view of the impact that suggestive
information can have on interpreter performance, a study was undertaken
to evaluate checker performance when the intelligence information
he is checking is furnished by interpreters with varying records of
accuracy of identifications and confidence statements.


^1 Sadacca, R., H. Martinek, and A. I. Schwartz. Image Interpretation
Task--Status Report. Technical Research Report 1129. U. S. Army
Behavioral Science Research Laboratory. June 1962.

^2 Bolin, S. F., R. Sadacca, and H. Martinek. Team procedures in image
interpretation. Technical Research Note 164. U. S. Army Behavioral
Science Research Laboratory. December 1965.

^3 Doten, G. W. and R. Sadacca. Team interpretation procedures; Selection
of teammates and role assignment. Technical Research Note 201.
U. S. Army Behavioral Science Research Laboratory. January 1969.

^4 Doten, G. W., J. T. Cockrell, and R. Sadacca. The use of teams in
image interpretation; Information exchange, confidence, and resolving
disagreements. Technical Research Report 1151. U. S. Army Behavioral
Science Research Laboratory. October 1966.


SPECIFIC  OBJECTIVE 


The specific objective of the present study was to determine how
different levels of identification accuracy and of confidence validity
associated with an initial interpreter affect the confidence validity
of the checker. Of secondary interest was whether checkers supplied
with some knowledge of their own prior identification and confidence
performance would be affected differently.

METHOD 

Experimental  Subjects 

Eighteen interpreters recently graduated from the image interpretation
course at the U. S. Army Intelligence School, Fort Holabird,
Maryland, participated as subjects. All had met the school's entrance
requirement of a score of 100 or above in the General Technical Aptitude
Area (composite of Verbal and Arithmetic Reasoning tests).

Experimental  Materials 

Stimulus imagery consisted of four sets of 60 annotated targets.
Sets were carefully matched on target type, photo quality, scale, and
level of concealment; for example, each set had exactly the same number
of 3/4-ton trucks at good image quality, 1:5000 scale, and partial
concealment. However, the ordering of target type within each imagery set
depended upon the position of the targets in the roll of imagery and was
not identical for all sets. A list of potential target names as given
to the interpreters appears in Figure 1.


Figure  1.  Target  List 


TARGET LIST

Armored Personnel Carrier (APC)

Howitzer SP-105
Howitzer SP-155
Howitzer Towed - 105
Howitzer Towed - 155

Shelter - Canvas

Tank - M-41
Tank - M-48

Pup Tent
Command Post (CP) Tent
General Purpose (GP) Tent - Medium
General Purpose (GP) Tent - Large

Cargo Trailer 1/4
Cargo Trailer 3/4
Cargo Trailer
Water Trailer 1 1/2
Ammunition Trailer 2
Semi-Trailer (Low Bed)

Truck 1/4
Truck 3/4
Truck 2 1/2
Dump Truck 2 1/2
Truck 5

Bulldozer
Tractor
Wrecker
Civilian Vehicle

Radar Antenna
Experimental Design


The four sets of imagery, assumed to be equivalent, were randomly
assigned to each task requirement. One set was used in an initial
performance test which required subjects to supply identifications and
confidence statements for each annotation. Target identifications were
developed for each of the three remaining sets of imagery. Of the 60
annotations in each set, percentages correct were 25%, 50%, and 75%.
Misidentifications usually named a target type likely to be confused
with the target shown. Next, a hypothetical statement of confidence was
assigned to each identification. Each set of confidence estimates
included six at 50% and three at all other 5% steps ranging from 5% to 95%
(60 confidence values in all).

The three sets of confidence statements were associated with appropriate
target identifications to provide three levels of confidence
validity--square of the biserial coefficient of correlation between
confidence and accuracy of identification--equal to .00, .40, and .80.
Nine sets of identifications and associated confidence statements were
thus generated to represent all combinations of the three levels of
identification accuracy and the three levels of confidence validity.
Experimental conditions for each subject were fixed by random assignment
to a row position in a 3 x 3 Graeco-Latin square (Figure 2).
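
Figure 2 is not reproduced here, so the following is only a sketch of how a Graeco-Latin square of order 3 can pair the three accuracy levels with the three validity levels. The modular construction, the reading of columns as successive checking sessions, and the name `graeco_latin_3x3` are assumptions made for illustration; the square actually used may have been arranged differently.

```python
ACCURACY = (25, 50, 75)        # per cent correct, levels named in the text
VALIDITY = (0.00, 0.40, 0.80)  # squared biserial r, levels named in the text


def graeco_latin_3x3():
    """Return a 3 x 3 grid of (accuracy, validity) pairs.

    Cell (row, col) takes accuracy index (row + col) % 3 and validity
    index (row + 2 * col) % 3. These two Latin squares are orthogonal,
    so every accuracy level meets every validity level exactly once.
    """
    return [
        [(ACCURACY[(r + c) % 3], VALIDITY[(r + 2 * c) % 3]) for c in range(3)]
        for r in range(3)
    ]


square = graeco_latin_3x3()
for row_number, row in enumerate(square, start=1):
    # Each row is the sequence of conditions one group of subjects would see.
    print(f"row {row_number}: {row}")
```

Each row and each column contains every accuracy level and every validity level once, and the nine cells cover all nine combinations. With only nine cells, no degrees of freedom remain for an accuracy-by-validity interaction term.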

Procedure 

Preliminary Phase. To obtain individual measures of base performance
for comparison with team performance, interpreters in the sample
were asked to identify the annotated targets in the first set of 60 targets
and to state their confidence in each identification. They were
instructed to state their feeling of confidence in light of a special
payoff scheme designed to discourage overstatement or understatement of
their actual confidence. The values used are shown in Table 1; a
100-point penalty was threatened for each misidentification. The payoff
function and rationale for use of the procedure in the present study are
explained in Appendix B. The integration of the payoff scheme into the
experimental procedure is elaborated in the instructions to interpreters
(Appendix A).
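
The payoff values quoted in the instructions of Appendix A are 100 and 0 points at 100% confidence, 94 and 44 at 75%, 75 and 75 at 50%, and 44 and 94 at 25%. These values are consistent with a quadratic scoring rule rounded to whole points. The sketch below assumes that form; the function actually used is the one explained in Appendix B, and the names `quadratic_points`, `payoff`, and `expected_points` are illustrative only.

```python
def quadratic_points(confidence):
    """Unrounded (points if correct, points if incorrect) for a 0-100 confidence.

    Assumed quadratic rule: the credit falls off with the squared distance
    between the stated confidence and the outcome (100 = correct, 0 = incorrect).
    """
    if_correct = 100 - (100 - confidence) ** 2 / 100
    if_incorrect = 100 - confidence ** 2 / 100
    return if_correct, if_incorrect


def payoff(confidence):
    """Whole-point credits, as they would appear in a printed payoff table."""
    if_correct, if_incorrect = quadratic_points(confidence)
    return round(if_correct), round(if_incorrect)


def expected_points(stated, true_chance):
    """Expected credit when the true chance of being correct is true_chance (0-100)."""
    if_correct, if_incorrect = quadratic_points(stated)
    return (true_chance * if_correct + (100 - true_chance) * if_incorrect) / 100


levels = range(0, 101, 5)  # the 5-point confidence scale offered to interpreters
for c in (100, 75, 50, 25):
    print(c, payoff(c))

# Stated confidence that maximizes expected credit when the true chance is 75%.
best = max(levels, key=lambda s: expected_points(s, 75))
print("best stated confidence:", best)
```

Under this assumed rule the expected credit is highest when the stated confidence equals the true chance of being correct, which is the property a payoff scheme needs in order to discourage both overstatement and understatement.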

After the preliminary phase, half the subjects were given a key to
the ground truth of the annotated targets and asked to review each
annotation, marking each of their identifications as correct or incorrect.
They were then individually shown how effectively they had used the
payoff scheme in accordance with their own responses. The other half
received no feedback.
Experimental Phase. In the experiment proper, each interpreter
was given three sets of annotated imagery. With each set he was given
identifications and associated confidence statements attributed to a
previous interpreter. He was instructed to examine each annotated target
and check the initial identification and associated confidence statement
and then to state his own confidence in the given identification.
He was not to modify the identification. He was told that the point
payoff scheme used in the preliminary phase of the experiment would also be
invoked here to evaluate his own confidence statements and to compare
them with those of the initial interpreter. Instructions to the
interpreters appear in Appendix A.


Dependent  Variables 

Listed below are seven measures derived from the data and used in
the analyses. Each was computed separately for each set of 60 responses.
The first five measures are also meaningfully defined across the responses
of each of the nine hypothetical initial interpreters, with initial
interpreter values for identification accuracy and confidence validity
serving as the principal independent variables. Variables 6 and 7 measure
the relationship between performance of the initial interpreter and that
of the checker.

1. Identification accuracy. Number of correct (to target type and
model or size) identifications divided by the total number of
identifications (60) expressed as per cent.

2. Confidence validity. Square of biserial correlation between
confidence statement and correctness of the identification.

3. Point score. Mean number of payoff points achieved by
interpreter.^5

4. Inappropriate confidence. Number of times confidence in an
incorrect identification was greater than 50% plus number of times
confidence in a correct identification was less than 50%.

5. Confidence spread. Tendency to make very high or very low
confidence statements.


Formula:

$$\frac{1}{60}\sum_{i=1}^{60} (c_i - 50)^2$$

where $c_i$ = confidence that identification $i$ is correct.


^5 No points were actually subtracted for incorrect identification as
threatened in the preliminary phase.
Table 1

PAYOFF VALUES USED IN EXPERIMENT

Level of Confidence      Point Credits if       Point Credits if
that Identification      Identification is      Identification is
is Correct               Correct                Incorrect

6. Checker/initial interpreter relationship. Correlation between
checker's confidence statements and those attributed to initial
interpreters (z-transformation).

7. Checker/initial interpreter relationship with accuracy of
identification partialed out (z-transformation).
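
Measures 2 and 4 can be sketched in code as follows. This is a minimal illustration, not the laboratory's scoring procedure. It assumes the classical biserial formula r_b = ((M_c - M_i) / s)(pq / y), where M_c and M_i are the mean confidence for correct and incorrect identifications, s is the standard deviation of all confidence statements (computed with N in the denominator), p and q are the proportions correct and incorrect, and y is the ordinate of the unit normal curve at the point dividing p from q. It also assumes 50% as the cutoff on both sides for inappropriate confidence. All function names are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def biserial_r(confidence, correct):
    """Biserial correlation between confidence (0-100) and correctness (bool)."""
    hits = [c for c, ok in zip(confidence, correct) if ok]
    misses = [c for c, ok in zip(confidence, correct) if not ok]
    if not hits or not misses:
        raise ValueError("need at least one correct and one incorrect item")
    p = len(hits) / len(confidence)
    q = 1.0 - p
    unit_normal = NormalDist()
    y = unit_normal.pdf(unit_normal.inv_cdf(p))  # ordinate at the p/q split
    return (mean(hits) - mean(misses)) / pstdev(confidence) * (p * q / y)


def confidence_validity(confidence, correct):
    """Measure 2: square of the biserial correlation."""
    return biserial_r(confidence, correct) ** 2


def inappropriate_confidence(confidence, correct, cutoff=50):
    """Measure 4: high confidence in errors plus low confidence in hits."""
    return sum(
        1
        for c, ok in zip(confidence, correct)
        if (not ok and c > cutoff) or (ok and c < cutoff)
    )


# Made-up responses for illustration only; not data from the study.
stated = [90, 60, 40, 10, 50, 75, 30, 85]
truth = [True, False, True, False, True, True, False, True]
print(confidence_validity(stated, truth))
print(inappropriate_confidence(stated, truth))
```

Because the coefficient is squared, the measure does not distinguish a positive from a negative relationship. The biserial coefficient can also exceed 1 when the confidence statements are far from normally distributed, so values from small or extreme samples should be read with caution.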

RESULTS  AND  INTERPRETATION 

Table 2 summarizes preliminary phase data for interpreter performance
without knowledge of "previous" identifications or confidence statements.
Data were analyzed to describe differences in performance between the
average experimental interpreter and the hypothetical initial interpreters
and to get an idea of relationships among dependent variables. Mean
identification accuracy falls between the first two levels of
hypothetical interpreter identification accuracy, 25% and 50%, respectively.
Mean confidence validity of .15 falls between the first two levels
of hypothetical interpreter confidence validity, .00 and .40, respectively.
These contrasts permit the established levels of hypothetical initial
interpreter identification accuracy and confidence validity to be reasonably
labeled (for future reference) as relatively poor, good, and excellent,
respectively. Rather poor confidence performance during the preliminary
phase is also reflected in the low mean point score (68) and high mean
inappropriate confidence (24). In fact, had interpreters stated 50%
confidence for every response they would have obtained a higher point score
(75). A mean confidence spread of 995 indicates greater use of confidence
values near 0 or 100 than was attributed to the hypothetical interpreters.
In regard to the intercorrelations among variables, it is not surprising
that confidence validity, point score, and inappropriate confidence
intercorrelated significantly since all three were intended to measure the
degree of correspondence between confidence and ground truth. From the
significant correlation coefficients obtained for each of these three
variables with identification accuracy, it appears that subjects with
superior identification accuracy also gave superior confidence performance.

An analysis of variance was performed on each dependent variable in
the experimental phase, and the significant F-ratios are given in Table 3.
For no variable did feedback at the end of the preliminary phase prove
to be a significant effect. For all dependent variables but one,
significant differences were found only for the main effects of major interest:
identification accuracy/imagery set and confidence validity. (Because
of the care taken to match imagery sets, the identification accuracy/
imagery set effects are assumed to be due mainly to differences in
identification accuracy as opposed to imagery variations.) Mean initial
levels of identification accuracy and confidence validity for the checkers
are presented in Table 4 in comparison with the means established for
the three initial interpreters at each level of identification accuracy
and confidence validity.
On the three measures of confidence performance--confidence validity,
point score, and inappropriate confidence--checkers were generally able
to improve more substantially on mean initial interpreter performance in
the case of a poor or good interpreter than in the case of an excellent
interpreter. Checker improvement over initial interpreter performance
showed a general decline as the level of initial performance increased.
In fact, a degradation in performance was observed for confidence validity
at the highest level of initial confidence validity and for point
score at the highest levels of both initial identification accuracy and
confidence validity. However, checker means for confidence validity,
point score, and inappropriate confidence indicated far superior
performance than was observed for the same variables during the preliminary
phase. Checker performance on these variables was better at the higher
levels of initial confidence validity; however, performance was best at
the lowest level of initial identification accuracy, 25%. Although the
analysis of variance design did not permit the recovery of a term for
interaction between initial identification accuracy and confidence validity,
intuition would suggest that some kind of interaction was present.

By design, confidence spread was identical across the hypothetical
initial interpreters and equal to 712.5. Checker confidence spread
decreased with increasing initial identification accuracy, indicating
that checkers made more extreme confidence statements when reviewing
the responses of a less accurate interpreter. Mean confidence spread
during checking was far larger than when interpreters were assigning
confidence to their own identifications. Interpreters were apparently
more willing to state extreme confidence in an identification made by
someone else.
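
The design value of 712.5 can be checked directly from the definition of confidence spread, the mean squared deviation of the 60 confidence statements from 50. The sketch below assumes that each hypothetical set held six statements at 50% and three at every other 5-point step from 5% to 95%, which is the reading of the design consistent with the reported value. The function names are illustrative only.

```python
def confidence_spread(confidence):
    """Mean squared deviation of confidence statements from 50."""
    return sum((c - 50) ** 2 for c in confidence) / len(confidence)


def designed_confidence_set():
    """Six statements at 50 and three at each other 5-point step from 5 to 95."""
    values = []
    for level in range(5, 100, 5):
        values.extend([level] * (6 if level == 50 else 3))
    return values


hypothetical = designed_confidence_set()
print(len(hypothetical))                # 60
print(confidence_spread(hypothetical))  # 712.5
```

For reference, a set of statements all at 50% has a spread of 0, and a set using only 0% and 100% has the maximum spread of 2500.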

The obtained relationships between confidence performance of checkers
and hypothetical initial interpreters are shown in Table 5. Mean values
suggest greater acceptance of initial confidence statements when the rate
of identification accuracy was observed to be more or less distinct (25%
or 75%) than when observed to be chance (50%). Although considerably
lower, the partial relationship between checker and initial interpreter
remained significantly different and in the same direction across the three
initial accuracy levels. Checker confidence validity clearly increased
with initial interpreter confidence validity; however, that the increase
was for the most part attributable to the perceived correctness or
incorrectness of the identifications is shown by the nonsignificant differences
obtained when accuracy of identification was partialed out. Checkers in
stating their own confidence were generally more influenced by the initial
interpreter's overall accuracy rate than by the impact of his confidence
for an individual target identification.
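
The two relationship measures (variables 6 and 7) rest on standard formulas: the first-order partial correlation and Fisher's z-transformation. The sketch below is an illustration under assumptions. It uses the product-moment correlation throughout, with correctness coded 1 or 0, whereas the report does not state which coefficient was used when accuracy was partialed out. The sample values are made up, and all names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with z held constant (first-order partial)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def fisher_z(r):
    """Fisher z-transformation, used so that correlations can be averaged."""
    return math.atanh(r)


# Made-up statements for six targets; not data from the study.
initial = [80, 60, 40, 20, 70, 30]   # confidence attributed to initial interpreter
checker = [95, 30, 70, 10, 90, 15]   # checker's own confidence
correct = [1, 0, 1, 0, 1, 0]         # 1 = identification was correct

r_ci = pearson_r(checker, initial)
r_partial = partial_r(r_ci, pearson_r(checker, correct), pearson_r(initial, correct))
print(fisher_z(r_ci), fisher_z(r_partial))
```

A partial value well below the simple value indicates that much of the agreement between checker and initial interpreter is carried by the correctness of the identifications rather than by the initial confidence statements themselves.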

Of supplementary interest is the finding that checker/initial interpreter
partial relationship values decreased significantly after the first
checking session: session means were .555#  ‘iTo#  -iSo (P < .05).
Thus, checker tendency to rely on initial interpreter confidence declined
with task experience.

Overall, the results point to the following general explanation:
Checkers tended to augment the initial interpreter's confidence value when
the checker perceived the identification to be correct and to reduce it
when he perceived it to be incorrect; in each case, however, the checker
exercised temperance. The amount of temperance was far more for perceived
correct identifications than for perceived incorrect identifications.
That is, checkers tended to use a more extreme confidence statement when
in disagreement with the identification.
SIGNIFICANT F-RATIOS FOR DEPENDENT VARIABLES
CONCLUSIONS AND DISCUSSION


The conclusions of principal interest for team interpretation methods are:


1.  A  checker  can  usually  improve  on  the  confidence  validity  of  an 
initial  interpreter  who  is  relatively  poor  or  good  in  making  confidence 
statements,  but  most  checkers  will  degrade  the  confidence  validity  of  an 
excellent  confidence  assessor. 

2.  The  confidence  validity  of  an  interpreter  when  he  is  performing 
a  checking  function  is  considerably  above  the  validity  of  his  confidence 
in  his  own  identifications. 

3. In general, in checking confidence statements, a checker is more
sensitive to initial interpreter variations in identification accuracy
than to variations in confidence validity.

The first finding is consonant with results from other team method
studies which indicate that the better the initial interpreter, the less
the gain that can be expected through employing a team method. However, the
low mean initial confidence validity of .17 obtained in the preliminary
phase of the experiment indicates that the confidence statements assigned
by most interpreters could stand considerable improvement. That checkers
were less sensitive to confidence validity than to identification accuracy
rates is not surprising considering the greater emphasis placed on accuracy
on the job and in training. A "halo" effect may also be in operation.
Perceiving the initial interpreter's accuracy rate to be no better than
chance, the checker may tend to ignore the validity of his confidence
statements.

Of secondary interest is the finding that the extra training afforded
the interpreters receiving feedback apparently did not have any effect on
subsequent performance. The payoff scheme represented a new response mode
for the interpreters. Because of the short task duration, it is doubtful
that more than a few came to understand its operation. Giving feedback
after each response and not after a large block of responses as was done
here might have had greater impact.
APPENDIX A


INSTRUCTIONS TO INTERPRETERS FOR PRELIMINARY PHASE OF EXPERIMENT

In the first phase of this experiment, you are to examine 60 annotated
images all of which are actual targets. Your task is to identify each
target. On a separate sheet is printed a list of targets from which you can
choose; the target name you assign must appear on this list. However, please
note that this is a general list of "possible" targets. Some of the items
listed may not be among those you will be looking at. It is very important
that you include in your identification the type and/or size of the target
when more than one type or size appears on the target list. To illustrate,
the response "Tank" will not be accepted; it must be "Tank - M-41" or "Tank
- M-48." "Truck" is not acceptable; it must be, for example, "Truck 1/4,"
"Dump Truck 2 1/2," etc.

In addition to the identification we would like to know how confident
you are that your identification is correct. You are to use a confidence
scale that runs from 0 to 100, where 100 indicates that you are certain
your identification is correct. If you use this scale accurately, all of
the identifications for which you indicate 100% confidence should be
correct, 80% of the identifications for which you indicate 80% confidence
should be correct, 50% of the identifications for which you indicate 50%
confidence should be correct, and so forth. You can use 0, 05, 10, 15,
20, ... 75, 80, 85, 90, 95, 100 to indicate your estimate of the probability
that you have made a correct identification.

From previous experiments we have found that an interpreter's
statements of confidence in his identifications are very important in evaluating
the accuracy of his identifications; so try to be as accurate as possible.
To help prevent you from over- or underestimating your degree of confidence,
we are going to use a table of payoff credits specially designed to score
the appropriateness of your confidence measures. If you look at the payoff
sheet, you see three separate columns. In the first column of the table
are listed confidence levels from 100 to 0 at 5% intervals. In the second
and third columns are listed--corresponding to the confidence level--the
number of credits or points you will win if the particular identification
being judged is correct, and the number of points you will win if it is
incorrect. You may observe that the more confident you are that a given
identification is correct, the more points you will win if it indeed is
CORRECT, and the less confident you are that it is correct the more points
you will win if it indeed is INCORRECT. For example, if you are 100%
confident of an identification, you will get 100 points if it is correct but
you will get nothing if it is wrong. If you are 75% confident, you will
get 94 points if you are correct and 44 points if you are incorrect. When
you are 50% sure about an identification, you imply that you are equally
confident of being correct as you are of being wrong. Therefore, at the
50% level of confidence you will get the same number of points whether you
are right or wrong, namely 75. If you are 25% confident, you will get
44 points if you are correct and 94 points if you are incorrect.


Notice that when you are 0% confident about an identification and
you prove wrong the payoff table says you are entitled to the maximum
number of points--100. But we are interested in the accuracy of your
identification as well as in your ability to estimate confidence.
Therefore, in Phase I of this experiment, for every incorrect identification
you will be penalized 100 points. If you are 0% confident about a wrong
identification, you will get 100 points according to the payoff but will
lose 100 for being wrong, so you will wind up with no points at all. It
should therefore be clear that you have absolutely nothing to gain if you
misidentify targets and assign a low probability to the misidentification.

In summary, the more honest you are about your level of confidence,
the more points you stand to win. The points that you accumulate for
each identification will be summed and at the end of the experiment you
will be provided with your total score. You will also be given a
statement as to how well you did in comparison with the other interpreters
who participated in the experiment. So please try to get as high a score
as possible.

Blank responses are unacceptable. You must write down an
identification for every annotation.

Are  there  any  questions? 


INSTRUCTIONS  FOR  PERFORMANCE  FEEDBACK 

Before giving you a key for the correct identifications, we want to
give you some feeling for how well you are estimating the probability
that your identifications are correct. The best way for us to accomplish
this is to have you score your identifications and related confidence
measures in accordance with the Table of Payoff Credits.

It is essential for the purposes of this research that you cooperate
fully and honestly in scoring your own answer sheet. Also, you are
expected to gain an accurate understanding of how the Payoff Table
influences your level of confidence so as to allow you to become a better
probability estimator in the sessions to follow. We will proceed as
follows:

First, I will read off the list of correct identifications for the
60 annotations. Listen to each correct target name carefully, and then
if your answer is correct mark a "C" in the column headed C/I; if your
answer is incorrect mark an "I" in that same column. Your identification
must be precisely correct. As an example, if the right answer is "truck
- 5/4", you must have "truck - 5/4" to get a "C"; if you have listed
"truck - 1/4", you get an "I". Let's do that now.

For each annotation, you now should have either a "C" or an "I".

To score each response, look at the value for your confidence, find this
value in column one of your Payoff Table, then select the corresponding
number of point credits in column "C" if your response was correct or in
column "I" if your response was incorrect. Write the resulting number in
the column on your response sheet labeled PT. Do this for each one of
the 60 responses. Return to your response sheets and make a small x to
the right of every PT box for which the point value is less than 75. For
every response for which you now have an x to the right of the PT box, it
means that you were either less than 50% confident of what turned out to
be a correct response or more than 50% confident of what turned out to be
an incorrect response. If you look at the Payoff Table for each x'ed
response, you see how many points you won and how many you could have won
if the outcome of your response had been more in line with your expressed
level of confidence. Let us give some examples:

If you were 30% confident on what turned out to be a correct response,
you only got 51 points, whereas you could have got 91 points if you had
been 70% confident about the response; the difference is 40 points, which
in this example represents the penalty you paid for underestimating your
confidence. If you were 80% confident of what turned out to be an
incorrect response, you got only 36 points when you could have got 96
points if you had been 20% confident about the response; the difference
is 60 points, which in this example represents the penalty you paid
for overestimating your confidence. Familiarize yourself with the impact
of such point differences for every x'ed response, that is, for every
response for which you greatly misjudged your level of confidence.

By carefully following our instructions for learning about the
properties of the Payoff Table, it should become very clear to you that the
best thing for you to do is to always respond with a confidence that
honestly reflects how you feel about the particular annotation.

You  are  now  to  proceed,  with  the  help  of  a  key  for  the  correct  iden¬ 
tifications,  to  re-examine  each  of  the  annotations.  Pay  special  atten¬ 
tion  to  those  for  which  your  response  was  marked  with  an  x,  that  is, 
those  for  which  your  level  of  confidence  was  inappropriate. 

You need not tally up your total point credits. We will do that
for you, and at the same time we will subtract 100 points for each
misidentification. You will get the results at the end of the experiment.


INSTRUCTIONS FOR EXPERIMENT PROPER


Your task now will be to examine a different set of 60 annotated
targets from the same roll of imagery. This set has already been
interpreted by an image interpreter from a previous graduating class. For
each annotation, this interpreter selected a few specific target names
from the target list. The interpreter has also assigned to each target
name a level of confidence that the target name is correct. For each
annotation, you will be given one of the target names selected by the
image interpreter together with the level of confidence assigned to it.

It is very important that you understand that the target name listed is
not necessarily the one which the interpreter thought was most probable.

For example, if the interpreter was 30% confident that it was a 1/4 ton
truck, he may have been 60% confident that it was a 1/4 ton trailer.

Very  often,  however,  the  target  name  will  have  been  his  first  choice. 

In fact, whenever the expressed level of confidence is greater than
50%, this means that the man was more confident of the listed target name
than of any other. The 60 particular annotations that you will observe
have been selected from a much larger set interpreted by the same man in
a way that gives a good sample of the interpreter's confidence estimates.

Your task is as follows. Look at the annotated object and then at
the identification and assigned confidence made by the previous
interpreter. Then, in the appropriate space on the response sheet, state your
own confidence that the annotated target is in truth what the man reported
it to be. In other words, if the man said it had probability of 50% of
being an APC, tell us what you think the probability is that it is an APC.
Your personal level of confidence may be similar to or very different from
that of the previous man; it may be higher or lower. To use the same
example, if you are very confident that the target in question is not an
APC, then simply assign a very low probability to it. You are always to
estimate the probability that the specific annotation is actually the
target identification listed on your response sheet. You are not required
to provide any alternative target names for any of the annotations. Since
in this phase of the experiment you cannot make a misidentification, you
will not be penalized as you were in Phase I. That is, 100 points will
not be subtracted for any misidentification. However, your estimated
confidences will be strictly scored according to the same payoff table
employed in Phase I. Therefore, please try to be very accurate with your
own confidence judgments.

After you complete the first set of 60 annotations in Phase II, you
will be presented with another batch of 60 annotations together with a set
of corresponding responses collected from those of a different image
interpreter. The task procedure will be the same as explained above. Finally,
you will be asked to respond in the same way to another set of similar
image materials arranged from the responses of a third interpreter.

The man whose responses you are considering is identified by a
number on the top of each respective response sheet. Be sure that you
are working with a different man number in each of the three sessions
here in Phase II. Remember that in all sessions your own confidence
will be scored according to the payoff table. Your score will be
compared to that achieved by the interpreter you are checking to see who
was more accurate, so try to estimate your confidence as accurately as
possible.

After  you  complete  a  set  of  60  annotations,  please  roll  the  imagery 
back  to  photo  no.  1.  For  each  man,  you  must  examine  the  60  annotations 
in  order  from  1  to  60;  you  are  not  permitted  to  go  BACKWARDS. 

Are  there  any  questions? 

Figure A-1. Sample Response Sheet


APPENDIX  B 


RATIONALE  FOR  PAYOFF  FUNCTION 

After an image interpreter has identified a target, he is often
asked  to  state  his  confidence  that  the  identification  is  indeed  correct. 
If  his  confidence  statement  is  to  have  operational  value,  it  is  important 
that  the  statement  accurately  reflect  his  true  feeling  of  confidence. 

In  most  experiments,  the  interpreter  is  simply  asked  to  state  a  level  of 
confidence,  but  this  method  can  be  criticized  on  grounds  that  there  is 
no  way  of  knowing  if  the  stated  confidence  matches  true  confidence.  For 
various  implicit  and/or  explicit  reasons  that  depend  upon  the  personality 
of  the  interpreter  and  the  given  task,  interpreters  often  tend  to  "hedge" 
their  confidences;  i.e.,  they  may  either  overstate  or  understate  true 
confidence  if  they  see  a  particular  advantage  in  doing  so. 

Several payoff schemes have therefore been developed to encourage
honest statement of confidence (subjective probability)^. If t is
true confidence and c is stated confidence, then these functions are
alike in that they grade a reward/penalty (usually points) for each
response in accordance with a special nonlinear function of deviation of
c from t. The quadratic payoff function^ was instrumented in this study.
The linear constraints for the function were adjusted for convenience
to make payoff credits positive with range from 0 to 100 (Table 1 of the
text). For a correct identification, the interpreter was awarded
100 - 100(1 - c)^2 points, but for an incorrect identification, he was
awarded 100 - 100c^2 points.
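
As an illustration only, the two payoff expressions (100 - 100(1 - c)^2 for a correct identification, 100 - 100c^2 for an incorrect one) can be tabulated with a short Python sketch. The function name payoff and the choice to express confidence in percent are assumptions made here for convenience; Table 1 of the text remains the authoritative listing, and any rounding it applies is not reproduced.

```python
def payoff(confidence_pct, correct):
    """Point credit for one response under the quadratic payoff rule.

    confidence_pct: stated confidence c, in percent (0 to 100).
    correct: True if the identification was correct, False otherwise.
    """
    if not 0 <= confidence_pct <= 100:
        raise ValueError("confidence must lie between 0 and 100 percent")
    # A correct response is scored on (1 - c), an incorrect one on c.
    miss_pct = (100 - confidence_pct) if correct else confidence_pct
    # Equivalent to 100 - 100 * (miss_pct / 100) ** 2.
    return 100 - miss_pct ** 2 / 100


if __name__ == "__main__":
    # Print a small payoff table in steps of 10 percent.
    print("conf%  correct  incorrect")
    for pct in range(0, 101, 10):
        print(f"{pct:5d}  {payoff(pct, True):7.0f}  {payoff(pct, False):9.0f}")
```

Under this sketch, payoff(30, True) is 51 and payoff(70, True) is 91, while payoff(80, False) is 36 and payoff(20, False) is 96. These values agree with the 40-point and 60-point differences quoted in the feedback instructions of Appendix A, and a stated confidence of 50% earns 75 points whether the response is correct or not.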

It is necessary to show that in terms of normative decision theory,
it is the subject's best strategy to always state his confidence
accurately, i.e., to set c equal to t. As far as the interpreter is
concerned, his expected number of points for any response is:

t [100 - 100(1 - c)^2] + (1 - t) [100 - 100c^2],

or

100 - 100(c^2 - 2tc + t).


^ Toda, M. Measurement of subjective probability distribution. Institute
for Research, Division of Mathematical Psychology, Report No. 3, 1963.
State College, Pennsylvania.

^ Roby, T. B. Belief states, evidence, and action. In Predecisional
processes in decision making. USA Medical Research Laboratory Technical
Document Report No. 64-77, 1964, Behavioral Sciences Laboratory,
Wright Patterson AFB.

^ van Naerssen, R. F. A scale for the measurement of subjective
probability. Acta Psychologica, 1962, 159-166.

By taking the partial derivative with respect to c and setting it equal
to zero, it follows that expected points will be a maximum if and only
if c equals t. In summary, the closer c is to t, the more the
interpreter has to gain expected point-wise.
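
The maximization step can be written out explicitly. The following is a sketch of the calculus, using E(c) as shorthand (a symbol not used in the original) for the expected number of points, 100 - 100(c^2 - 2tc + t), when true confidence is t and stated confidence is c:

```latex
\begin{align*}
E(c) &= 100 - 100\,(c^{2} - 2tc + t) \\
\frac{\partial E}{\partial c} &= -200\,(c - t) = 0 \iff c = t \\
\frac{\partial^{2} E}{\partial c^{2}} &= -200 < 0 \\
E(t) - E(c) &= 100\,(c - t)^{2} \ge 0
\end{align*}
```

The negative second derivative confirms that c = t is the unique maximum. The last line shows that the expected loss from misstating confidence grows with the square of the deviation of c from t.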

During the preliminary phase of the experiment where interpreters
made their own identifications, it would seem that the interpreter could
take advantage of the payoff system by making intentional
misidentifications and assigning very low confidence to them. For example, suppose
the interpreter is fairly confident that the imaged object is some kind
of vehicle. If he identifies it as a pup tent, knowing that this is
clearly incorrect, but states 0% confidence, then he would get 100 points
for an incorrect identification. To insure against this undesired
possibility, the interpreter was told that he would be penalized 100 points
for each incorrect identification (see Instructions, Appendix A).
However, as indicated in the definitions of dependent variables, no points
were actually deducted in computing point score.
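
The effect of the announced penalty can be checked numerically. The sketch below assumes the quadratic payoff defined earlier in this appendix and a flat 100-point deduction for any incorrect identification. The names payoff, announced_net, and expected_net are hypothetical and chosen only for this illustration.

```python
def payoff(confidence_pct, correct):
    """Quadratic payoff credit; confidence is in percent (0 to 100)."""
    miss_pct = (100 - confidence_pct) if correct else confidence_pct
    return 100 - miss_pct ** 2 / 100


def announced_net(confidence_pct, correct, penalty=100):
    """Net points as announced in Phase I: the payoff credit, minus a
    flat penalty whenever the identification is incorrect."""
    credit = payoff(confidence_pct, correct)
    return credit if correct else credit - penalty


def expected_net(true_pct, stated_pct, penalty=100):
    """Expected announced net points when the identification is correct
    with probability true_pct percent and stated_pct percent is reported."""
    t = true_pct / 100
    return (t * announced_net(stated_pct, True, penalty)
            + (1 - t) * announced_net(stated_pct, False, penalty))


if __name__ == "__main__":
    # Deliberate misidentification reported at 0% confidence.
    print("pup-tent ploy, no penalty:  ", payoff(0, False))
    print("pup-tent ploy, with penalty:", announced_net(0, False))
    # Honest identification believed correct with 60% probability.
    print("honest 60% identification:  ", round(expected_net(60, 60), 2))
```

Because the deduction does not depend on the stated confidence, it removes the gain from a deliberate misidentification without moving the optimum away from c = t.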

In  addition  to  encouraging  the  subject  to  be  honest,  the  payoff 
function  served  as  a  means  of  measuring  confidence  performance  through 
mean  number  of  points.  Point  score  was  obtained  by  employing  the  same 
payoff  structure  to  score  each  confidence  statement  in  light  of  whether 
the  identification  was  correct  or  incorrect.  Properties  and  uses  of  the 
quadratic  and  similar  payoff  functions  as  scoring  rules  have  been  dis¬ 
cussed.^ 
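
A minimal sketch of the point score as described here: each confidence statement is scored with the quadratic payoff and the credits are averaged, with no 100-point deduction. The function names and the sample responses are hypothetical, invented purely to show the calculation, and are not data from the study.

```python
def payoff(confidence_pct, correct):
    """Quadratic payoff credit; confidence is in percent (0 to 100)."""
    miss_pct = (100 - confidence_pct) if correct else confidence_pct
    return 100 - miss_pct ** 2 / 100


def mean_point_score(responses):
    """Average payoff credit over (confidence_pct, correct) pairs."""
    responses = list(responses)
    if not responses:
        raise ValueError("at least one response is required")
    return sum(payoff(c, ok) for c, ok in responses) / len(responses)


if __name__ == "__main__":
    # Hypothetical responses: (stated confidence in percent, was it correct?)
    graded = [(80, True), (70, True), (30, False), (20, False)]
    all_or_nothing = [(100, True), (100, True), (100, False), (100, False)]
    print("graded confidences:        ", mean_point_score(graded))
    print("all-or-nothing confidences:", mean_point_score(all_or_nothing))
```

Both hypothetical interpreters are correct on half of their identifications, so the gap between the two mean scores reflects only how well stated confidence tracks correctness.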

The  theory  of  admissible  payoff  functions  for  subjective  probability 
measurement  calls  for  a  workable  integration  of  mathematical^  and  psycho¬ 
logical^  constructs.  The  success  of  these  measurement  methods  and  the 
need  for  their  incorporation  into  relevant  experiments  await  further  re¬ 
search.  However,  any  fair  test  of  the  efficacy  of  the  method  would 
strive  to  adhere  to  the  following  task  criteria: 

1.  The  response  mode  and  scoring  method  and  their  implications 
must  be  known  and  well  understood  by  the  interpreter.  Training  may  be 
required  to  impress  upon  the  interpreter  the  necessary  correspondence 
between his own beliefs and the numbers into which these must be
translated.^


^ Winkler, R. L. The quantification of judgment: some methodological
suggestions. Journal of American Statistical Association, 1967, 62,
1105-1120.

^ Shuford, E. H., A. Albert and H. E. Massengill. Admissible probability
measurement procedures. Psychometrika, 1966, 31, 125-147.

^ de Finetti, B. Methods for discriminating levels of partial knowledge
concerning a test item. British Journal of Mathematical and
Statistical Psychology, 1965, 18, 87-125.

^ Such training, although at a very superficial level, was attempted in
the feedback mode after the preliminary phase of the present experiment.

2. The task must be so structured that it is to the disadvantage
of the interpreter to respond in a manner inconsistent with his
expectations. Maximization of expected points on each trial is to be achieved
by making c congruent to t.

3. The interpreter should be keenly interested in maximizing his
expected total score, each point added to the score having equivalent
utility, either moral or material.

4. The method of measurement must be operational, efficient, and
practical.